fix(mcp): harden daemon discovery (identified PONG, port scanning, version takeover, resident-by-default) #115
StefanSteiner merged 7 commits on Jun 7, 2026
Liveness checks now send PING and require an identifying 'PONG hyperdb-mcp <version>' reply (verified by exact tokens, not a string prefix), so a foreign process camped on the health port no longer reads as a live daemon. Default base port moves 7484 -> 7485 (7484 is hyperd's conventional gRPC port) and a PortScan resolver is introduced (pin when HYPERDB_DAEMON_PORT is set, else scan span 16). Also fixes a latent bug: the server heartbeat re-resolved the port instead of using the daemon's discovered health_port, which would target the wrong port once scanning lands. The engine now carries daemon_health_port and the heartbeat uses it.
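As a minimal sketch of the exact-token check described above (not the PR's actual code), the reply can be split on whitespace and accepted only when it consists of exactly three tokens. The function name `parse_identified_pong` and the constant `DAEMON_NAME` are hypothetical, introduced here for illustration.

```rust
/// Name the daemon is assumed to announce in its PONG reply.
const DAEMON_NAME: &str = "hyperdb-mcp";

/// Returns the announced version if `reply` is exactly
/// `PONG hyperdb-mcp <version>` (three whitespace-separated tokens),
/// otherwise `None`. Hypothetical helper, not the PR's real function.
fn parse_identified_pong(reply: &str) -> Option<&str> {
    let mut tokens = reply.split_whitespace();
    match (tokens.next(), tokens.next(), tokens.next(), tokens.next()) {
        // Exactly three tokens, and the second must equal the daemon name.
        (Some("PONG"), Some(name), Some(version), None) if name == DAEMON_NAME => Some(version),
        // Anything else (bare PONG, lookalike name, trailing tokens) is rejected.
        _ => None,
    }
}
```

Comparing whole tokens rather than a string prefix means a lookalike such as `PONG hyperdb-mcpx 1.0.0` is rejected, which a `starts_with("PONG hyperdb-mcp")` test would wrongly accept.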
ensure_daemon now scans a port range (PortScan) instead of a single fixed port: it PING-identifies each port, returns the first running hyperdb-mcp daemon (verified via STATUS), and otherwise spawns a fresh daemon on the first connection-refused port. probe_port distinguishes our-daemon / camped-foreign / refused; a process that answers TCP but not the identified protocol is treated as camped and skipped. A newly starting client whose semver is strictly newer than the running daemon takes it over: STOP the old daemon (which drops its HyperProcess and stops hyperd), wait for the health port to release, then respawn on the same port. Equal/older/unparseable versions reuse the daemon — never a downgrade-kill. Adds semver as a direct dependency (already in lock).
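The scan policy in this commit can be sketched as follows. This is an illustration under assumptions, not the PR's code: the probe is passed in as a closure so the policy can be exercised without sockets, the `ScanOutcome` variant `Exhausted` and the function name `scan_ports` are hypothetical, and the sketch assumes a running daemon anywhere in the range wins over an earlier free port.

```rust
/// Classification of a single port, following the names in the PR text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Probe {
    OurDaemon, // answered PING with the identified PONG
    Camped,    // accepted TCP but did not speak the identified protocol
    Refused,   // connection refused, so the port is free to bind
}

/// Outcome of a scan. `Exhausted` is a hypothetical name for "nothing usable".
#[derive(Debug, PartialEq, Eq)]
enum ScanOutcome {
    Found(u16),
    FreePort(u16),
    Exhausted,
}

/// Probes `span` ports starting at `base`. Returns the first port that hosts
/// our daemon; if there is none, returns the first refused port as the place
/// to spawn. Camped ports are skipped.
fn scan_ports(base: u16, span: u16, probe: impl Fn(u16) -> Probe) -> ScanOutcome {
    let mut first_free = None;
    for offset in 0..span {
        // Stop rather than wrap around if the range runs past u16::MAX.
        let Some(port) = base.checked_add(offset) else { break };
        match probe(port) {
            Probe::OurDaemon => return ScanOutcome::Found(port),
            Probe::Refused => {
                if first_free.is_none() {
                    first_free = Some(port);
                }
            }
            Probe::Camped => {}
        }
    }
    first_free.map_or(ScanOutcome::Exhausted, ScanOutcome::FreePort)
}
```

With a camped base port and a refused neighbour, `scan_ports(7485, 2, ...)` yields `FreePort(7486)`, which is the situation the later test-narrowing commit pins down.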
Idle shutdown is now opt-in: DaemonConfig.idle_timeout is Option, set only when --idle-timeout or HYPERDB_DAEMON_IDLE_TIMEOUT is provided (flag wins over env). With neither set the idle monitor never arms, so the daemon and its hyperd stay resident — eliminating the connection error + 'hyper restarting, retry' churn a client hit after a 30-min idle shutdown. The hyperd restart-limit shutdown path is unchanged. The daemon CLI --port is now Option<u16>: 'daemon stop'/'status' omit it and resolve the live daemon via find_running_daemon() (discover + scan), so they no longer miss a daemon that scanned onto a non-base port. A bare 'hyperdb-mcp daemon' binds resolve_port_scan().base.
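The flag-over-env precedence for the idle timeout might be expressed like this. It is a sketch only: the function name `resolve_idle_timeout` is hypothetical, the value is assumed to be a whole number of seconds (the PR text does not state the unit), and an unparseable value is treated as unset, where the real CLI may well report an error instead.

```rust
use std::time::Duration;

/// Resolves the optional idle timeout. `flag` stands for the raw
/// `--idle-timeout` value and `env` for `HYPERDB_DAEMON_IDLE_TIMEOUT`.
/// Assumption: both are whole seconds. The flag wins when both are present;
/// with neither set the result is `None` and the idle monitor never arms.
fn resolve_idle_timeout(flag: Option<&str>, env: Option<&str>) -> Option<Duration> {
    flag.or(env)
        .and_then(|raw| raw.trim().parse::<u64>().ok())
        .map(Duration::from_secs)
}
```

Returning `Option<Duration>` rather than a default duration makes "resident by default" the natural state: the caller only starts an idle monitor when it receives `Some`.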
…dent default
Update README Operating Modes + CLI reference, DEVELOPMENT daemon internals, and the CHANGELOG Unreleased entry to reflect: identity-checked discovery (PONG hyperdb-mcp <version>), port scanning from 7485 (was fixed 7484, which clashes with hyperd gRPC), newer-client version takeover, and idle shutdown now being opt-in (daemon stays resident by default).
…radeoff
Final-sweep follow-ups:
- maybe_take_over: before respawning on the freed port, adopt a concurrently-published identity-verified daemon on that same port if one already exists, avoiding a redundant spawn and a stale-endpoint return during simultaneous version takeovers.
- Expand the heartbeat comment to explain why the discovered health_port is used instead of re-resolving (scanning can land off the base port).
- DEVELOPMENT.md: document the resident-by-default tradeoff for a hung-but-alive hyperd and note a possible daemon-side liveness probe.
The status tool now includes an "engine" block: mode (daemon/local), hyperd_endpoint (the libpq endpoint queries run against), and daemon_health_port (the shared daemon's control/lock port, null in local mode). Previously the endpoint was only reachable by reading ~/.hyperdb/daemon.json or via 'hyperdb-mcp daemon status'.
The test scanned the full port range between two arbitrary OS-assigned
ports. Other tests leak identity-answering HealthListeners on random
high ports for the process lifetime; one could land inside that range
and be returned as Found instead of FreePort (observed on Linux CI).
Narrow the scan to exactly two adjacent ports {base (camped), base+1
(free)}, confirming base+1 is bindable immediately before scanning, so
a leaked listener can no longer fall inside the window.
Summary
Hardens the hyperdb-mcp single-instance daemon so clients can't mistake a foreign process for the daemon, the default health port stops colliding with hyperd's gRPC port, upgrades take effect immediately, and the daemon (plus the hyperd it owns) stays resident, eliminating the "hyper is restarting, please retry" round-trip clients hit after an idle shutdown.
Resolves #114.
Background
The daemon advertises a TCP health port in ~/.hyperdb/daemon.json that serves as both a single-instance lock and a control channel. Four problems motivated this work:
1. discover() trusted a bare TCP connect() to the health port and never sent PING, so any process camped on the port read as a live daemon.
2. The default health port was 7484, which is hyperd's conventional gRPC port (ListenMode::Both { grpc_port: 7484 }), exactly the kind of process that triggers problem 1.
3. After an upgrade, the old daemon and its hyperd lingered until the 30-min idle timeout.
4. The idle shutdown also stopped hyperd; the next client hit a connection error surfaced as "hyper restarting, retry."
Changes
- PING now replies PONG hyperdb-mcp <version>. Clients verify the exact tokens (not a string prefix) before trusting a daemon; is_daemon_alive uses this instead of a bare TCP connect. A process that answers TCP but not the identified protocol is classified as "camped" and skipped; a stale/foreign daemon.json is detected and removed.
- resolve_port_scan() returns a PortScan { base, span }: it pins the exact port when HYPERDB_DAEMON_PORT is set, else scans 16 ports upward from the base (default 7485). probe_port classifies each as OurDaemon / Camped / Refused; the daemon spawns on the first refused port.
- daemon status / daemon stop locate the daemon via discovery + scan (CLI --port is now optional).
- A newly starting client whose version is strictly newer STOPs the old daemon (which drops its HyperProcess, stopping hyperd), waits for the port to release, and respawns on the same port. Equal / older / unparseable versions reuse the daemon; there is never a downgrade-kill.
- DaemonConfig.idle_timeout is now Option, set only via --idle-timeout / HYPERDB_DAEMON_IDLE_TIMEOUT (flag wins over env). With neither set the idle monitor never arms; the daemon and hyperd stay warm. The hyperd restart-limit shutdown (3 failures / 60s) is unchanged.
- status tool surfaces the endpoint. New engine block: mode (daemon/local), hyperd_endpoint, and daemon_health_port, previously only reachable via ~/.hyperdb/daemon.json or hyperdb-mcp daemon status.
- The server heartbeat re-resolved the port, which would target the wrong port under scanning; the engine now records the discovered health_port and the heartbeat uses it.
Decisions
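- HYPERDB_DAEMON_PORT set ⇒ pin that exact port (no scan).
- Takeover requires a strictly newer version (same-version builds, e.g. dev rebuilds, are equal and won't take over; use daemon stop to force).
- Stale or foreign daemon.json ⇒ remove daemon.json, re-spawn.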
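The pin-versus-scan decision can be sketched as below. The `PortScan { base, span }` shape and the values 7485 and 16 come from this PR; everything else is an assumption made for illustration: the function takes the environment value as a parameter instead of reading it, a pinned port is modelled as a span of 1, and an unparseable value falls back to the default scan.

```rust
/// Default base port and scan width stated in the PR.
const DEFAULT_BASE_PORT: u16 = 7485;
const DEFAULT_SPAN: u16 = 16;

/// Range of ports to probe, starting at `base` and covering `span` ports.
#[derive(Debug, PartialEq, Eq)]
struct PortScan {
    base: u16,
    span: u16,
}

/// Hypothetical, testable variant of resolve_port_scan(): `env_value` stands
/// for the contents of HYPERDB_DAEMON_PORT, if the variable is set.
fn resolve_port_scan_from(env_value: Option<&str>) -> PortScan {
    match env_value.and_then(|v| v.trim().parse::<u16>().ok()) {
        // Explicit port: pin it, no scanning (assumed to be span = 1).
        Some(port) => PortScan { base: port, span: 1 },
        // Unset (or, by assumption, unparseable): scan from the default base.
        None => PortScan { base: DEFAULT_BASE_PORT, span: DEFAULT_SPAN },
    }
}
```

A caller would pass `std::env::var("HYPERDB_DAEMON_PORT").ok().as_deref()` to obtain the real value.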
Known tradeoff
Resident-by-default removes the idle timeout that used to implicitly reap a hung-but-alive hyperd. Such a daemon now stays wedged until a client reports an error (REPORT_HYPERD_ERROR, fired on client-side ConnectionLost) or an operator runs daemon stop. Documented in DEVELOPMENT.md; a future daemon-side liveness probe could close the "all clients idle + hyperd hung" gap. See #114.
Testing
- reject-token-lookalike; resolve_port_scan pin-vs-scan; scan finds-via-STATUS / skips-camped / all-refused; client_should_take_over version matrix; DaemonConfig::from_args none/flag/env/precedence; status engine block.
- cargo test -p hyperdb-mcp green; cargo clippy --workspace --all-targets -- -D warnings clean; cargo fmt --all --check clean; cargo deny check clean (adds semver as a direct dep, already in the lock).
- status reports engine.hyperd_endpoint and daemon_health_port: 7486 (scanner correctly skipped an occupied 7485); version stamp matches the branch HEAD.
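The version matrix exercised by the client_should_take_over tests can be illustrated with a dependency-free sketch. The PR uses the semver crate; the stand-in below only understands plain MAJOR.MINOR.PATCH strings (no pre-release or build metadata), and both the signature and the helper `parse_version` are assumptions rather than the PR's actual API.

```rust
/// Parses a plain `MAJOR.MINOR.PATCH` string. Simplified stand-in for the
/// semver crate: anything with extra parts or non-numeric fields is `None`.
fn parse_version(s: &str) -> Option<(u64, u64, u64)> {
    let mut parts = s.trim().split('.');
    let major = parts.next()?.parse::<u64>().ok()?;
    let minor = parts.next()?.parse::<u64>().ok()?;
    let patch = parts.next()?.parse::<u64>().ok()?;
    if parts.next().is_some() {
        return None;
    }
    Some((major, minor, patch))
}

/// True only when both versions parse and the client is strictly newer.
/// Equal, older, or unparseable versions reuse the running daemon.
fn client_should_take_over(client: &str, daemon: &str) -> bool {
    match (parse_version(client), parse_version(daemon)) {
        // Tuples compare lexicographically: major, then minor, then patch.
        (Some(c), Some(d)) => c > d,
        _ => false,
    }
}
```

Defaulting to `false` whenever either side fails to parse keeps the conservative behaviour the PR describes: an unknown version never triggers a kill.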
Docs
README (Operating Modes + CLI reference), DEVELOPMENT.md (daemon internals +
tradeoff), CHANGELOG Unreleased. Wiki: new [Shared Daemon] page.